This report collects and examines selected techniques for imputing missing data and their impact on the predictive performance of classification algorithms. It covers both basic techniques (median/mode imputation) and more sophisticated ones (selected methods from the missForest, VIM, mice, and missMDA packages).
As the classifier we used ranger, a fast implementation of random forests that is particularly suited to high-dimensional data. Predictive performance was assessed with three measures: AUC, balanced accuracy (BACC), and the Matthews correlation coefficient (MCC).
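
For readers who want the three measures spelled out, the following is a minimal Python sketch of their standard textbook definitions for a binary problem. It is not the code used to produce the numbers in this report. The function names, the 0/1 label coding, and the toy data are assumptions made for illustration only.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; both classes must be present."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positives, five negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

print(round(auc(y_true, scores), 3))
print(round(balanced_accuracy(y_true, y_pred), 3))
print(round(mcc(y_true, y_pred), 3))
```

AUC is computed from the predicted scores, whereas balanced accuracy and MCC depend on the thresholded class labels. A method can therefore rank observations well (high AUC) and still score poorly on BACC and MCC for a given cut-off.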

The report contains all the results, grouped by package and by dataset.

basic (median/mode)
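
As an illustration of what median/mode imputation does, here is a minimal Python sketch. It assumes that fill values are learned on the training set and then reused on the test set. The report times the two sets separately but does not state this, so treat that as an assumption. The helper names `fit_imputer` and `apply_imputer`, the row-as-dict layout, and the use of `None` for missing entries are all hypothetical.

```python
from collections import Counter
from statistics import median


def fit_imputer(rows):
    """Learn one fill value per column from training rows (a list of dicts).

    Numeric columns get the median of the observed values; every other
    column gets its most frequent observed value. None marks a missing entry.
    """
    fill = {}
    columns = {key for row in rows for key in row}
    for col in columns:
        observed = [row[col] for row in rows if row.get(col) is not None]
        if not observed:
            continue  # nothing to learn from an entirely missing column
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        if numeric:
            fill[col] = median(observed)
        else:
            fill[col] = Counter(observed).most_common(1)[0][0]
    return fill


def apply_imputer(rows, fill):
    """Return copies of rows with missing entries replaced by learned values."""
    return [
        {col: (fill.get(col) if val is None else val) for col, val in row.items()}
        for row in rows
    ]


# Toy data (hypothetical column names and values).
train = [
    {"age": 30, "job": "clerk"},
    {"age": None, "job": "clerk"},
    {"age": 50, "job": None},
    {"age": 40, "job": "nurse"},
]
test = [{"age": None, "job": None}]

fill = fit_imputer(train)
train_imputed = apply_imputer(train, fill)
test_imputed = apply_imputer(test, fill)
print(test_imputed)
```

Because the method only needs one statistic per column, it is very cheap to run, which is consistent with the short imputation times reported in this section.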

adult

Crossvalidation results

Imputation times

## Train set imputation time:  0.12
## Test set imputation time:  0.058

Test set results

## Test set AUC:  0.916
## Test set BACC:  0.781
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.003

Test set results

## Test set AUC:  0.951
## Test set BACC:  0.889
## Test set MCC:  0.783

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.004

Test set results

## Test set AUC:  0.575
## Test set BACC:  0.591
## Test set MCC:  0.215

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.005
## Test set imputation time:  0.005

Test set results

## Test set AUC:  0.93
## Test set BACC:  0.874
## Test set MCC:  0.752

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  0.03
## Test set imputation time:  0.015

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.927
## Test set MCC:  0.898

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  0.089
## Test set imputation time:  0.043

Test set results

## Test set AUC:  1
## Test set BACC:  0.98
## Test set MCC:  0.976

Missings overview

jobs

Crossvalidation results

Imputation times

## Train set imputation time:  0.588
## Test set imputation time:  0.085

Test set results

## Test set AUC:  0.735
## Test set BACC:  0.55
## Test set MCC:  0.199

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  0.011
## Test set imputation time:  0.009

Test set results

## Test set AUC:  0.903
## Test set BACC:  0.667
## Test set MCC:  0.459

Missings overview

Plots summarizing the results of all packages tested on all datasets appear at the end of the report, possibly with some commentary.

missRanger

adult

Crossvalidation results

Imputation times

## Train set imputation time:  22.678
## Test set imputation time:  3.703

Test set results

## Test set AUC:  0.915
## Test set BACC:  0.78
## Test set MCC:  0.604

Missings overview

eucalyptus

Crossvalidation results

Imputation times

## Train set imputation time:  0.535
## Test set imputation time:  0.181

Test set results

## Test set AUC:  0.959
## Test set BACC:  0.883
## Test set MCC:  0.769

Missings overview

dresses-sales

Crossvalidation results

Imputation times

## Train set imputation time:  0.404
## Test set imputation time:  0.109

Test set results

## Test set AUC:  0.587
## Test set BACC:  0.57
## Test set MCC:  0.149

Missings overview

credit-approval

Crossvalidation results

Imputation times

## Train set imputation time:  0.483
## Test set imputation time:  0.172

Test set results

## Test set AUC:  0.919
## Test set BACC:  0.846
## Test set MCC:  0.695

Missings overview

sick

Crossvalidation results

Imputation times

## Train set imputation time:  1.58
## Test set imputation time:  0.487

Test set results

## Test set AUC:  0.996
## Test set BACC:  0.917
## Test set MCC:  0.886

Missings overview

SpeedDating

Crossvalidation results

Imputation times

## Train set imputation time:  79.609
## Test set imputation time:  17.848

Test set results

## Test set AUC:  1
## Test set BACC:  0.977
## Test set MCC:  0.972

Missings overview

jobs

Crossvalidation results

Imputation times

## Train set imputation time:  568.33
## Test set imputation time:  58.636

Test set results

## Test set AUC:  0.736
## Test set BACC:  0.551
## Test set MCC:  0.205

Missings overview

cylinder-bands

Crossvalidation results

Imputation times

## Train set imputation time:  1.144
## Test set imputation time:  0.44

Test set results

## Test set AUC:  0.933
## Test set BACC:  0.872
## Test set MCC:  0.768

Missings overview
